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A new self-learning algorithm for accelerated dynamics, reconnais- 
sance mctadynamics, is proposed that is able to work with a very 
large number of collective coordinates. Acceleration of the dynamics 
is achieved by constructing a bias potential in terms of a patchwork 
of one-dimensional, locally valid collective coordinates. These col- 
lective coordinates are obtained from trajectory analyses so that they 
adapt to any new features encountered during the simulation. We 
show how this methodology can be used to enhance sampling in real 
chemical systems citing examples both from the physics of clusters 
and from the biological sciences. 
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Introduction 

Many chemical systems, notably those in condensed mat- 
ter and in biology, are characterized by the presence of 
multiple low energy states which are separated by large barri- 
ers. The presence of these barriers prevents exploration of all 
of configuration space during the relatively-short timescales 
accessible in molecular dynamics (MD) simulations. Typi- 
cally this means that only those configurations in a small, 
locally-ergodic region in the vicinity of the input structure 
are visited. 

A large number of methods have been put forward to over- 
come this difficulty, many of which either use some form of en- 
hanced sampling (1-6) or focus on the transition from one lo- 
cal minimum to another (7-10). Other methods recognize that 
a small number of degrees of freedom (collective variables) ac- 
curately describe the interesting transitions and so either raise 
the temperature of these degrees of freedom (11-13) or intro- 
duce a bias to enhance the sampling along them (14-22). The 
major differences between the various approaches in this last 
class is the way in which the bias is generated a particularly 
useful technique being to use a bias that is dependent on the 
previously visited trajectory. This is the basis of the metady- 
namics method (23,24) that has been introduced by our group 
and applied to a large variety of chemical problems (25). 

For many chemical systems the interesting, reactive pro- 
cesses take place in a relatively low dimensional space (26,27). 
However, it is often not immediately obvious how to identify 
a set of collective variables (CVs) that span this 'reactive' 
sub-space. Furthermore, in methods like metadynamics the 
presence of barriers in the transverse degrees of freedom leads 
to incomplete sampling. The obvious solution therefore is to 
use very large numbers of collective coordinates. However, 
although this is theoretically possible with methods such as 
metadynamics, it is impractical because the volume of bias 
that one must add to the free energy surface, and hence the 
length of the simulation, increases exponentially with dimen- 
sionality. One suggestion for resolving this issue is to run 
multiple short, ID metadynamics simulations in parallel with 
different collective coordinates and to allow swaps between the 
different realizations based on a Monte Carlo criterion (28). 
Here we propose an alternative solution based on the real- 
ization that, if the free energy surface is to be flattened, the 
majority of the bias will have to be added at or near the 
minima in the surface. Identifying the locations of these min- 



ima is straightforward as, during a dynamical trajectory, the 
system should spend the majority of its time trapped in the 
vicinity of one or more of them (29). Therefore by using a 
form of "smart" bias that targets these low free energy re- 
gions specifically we can force the system away from them 
and into unexplored areas of configuration space. We call the 
self learning algorithm that we have developed based on these 
ideas Reconnaissance Metadynamics and have implemented 
it in the plugin for molecular dynamics PLUMED (24). In 
what follows we demonstrate this algorithm on two different 
systems - a cluster of seven Lennard Jones atoms and a short 
protein. 



Background 

Before introducing our new method a brief survey of estab- 
lished techniques for dealing with complex energy surfaces in 
terms of very large numbers of collective coordinates is in or- 
der. Zhu et al (30, 31) have introduced a method that uses 
a variable transformation to reduce barriers and thereby in- 
crease sampling, which works with a large number of collec- 
tive coordinates. Problematically though this method does 
not work if there are hydrogen bonds present. In contrast, 
methods that work by thermostating the CVs at a higher tem- 
perature (11-13) suffer no such problems and have been used 
successfully with large numbers of CVs (32). However in these 
methods, unlike metadynamics, there is nothing that prevents 
the system from revisiting configurations, which could prove 
problematic for examinations of glassy landscapes with many 
local minima of equal likelihood. 

For many free energy methods explicitly including a large 
numbers of collective coordinates is not feasible. However, one 
can use collective coordinates that describe a collective mo- 
tion that involves many degrees of freedom. For example, one 
can use the principle components of the covariance matrix of 
a large set of collective coordinates, the values of which have 
been calculated over a short MD trajectory (33,34). Alter- 
natively, one can take non-linear combinations by defining a 
path in the high-dimensionality CV space (35). The distance 
along and the distance from this path span a low-dimensional, 
non-linear space and metadynamics simulations using these 
two CVs have been shown to work well. However, a great 
deal of insight is required in choosing an initial path from 
which the CVs are generated as the simulation can only pro- 
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vide meaningful insight for the region of configuration space 
in the immediate vicinity of this path. 

Entropy plays an important role in free energy surfaces 
as it can wash out potential energy minima and make it such 
that finite temperature equilibrium states do not correspond 
to minima in the potential energy surface. Nevertheless, for 
systems with deep minima in the potential energy surface, 
one can use algorithms that locate all the minima on the K 
(potential) energy surface and assume that the properties at 
every point on this surface are the same as those of the nearest 
local minima safe in the knowledge that the entropic effects 
are small. These algorithms allow one to divide up config- 
uration space and map every point to its appropriate local 
minimum using a minimization algorithm (36). Furthermore, 
the slow modes in the vicinity of each basin provide good local 
collective coordinates as in the vicinity of basins the system 
is very nearly harmonic. From a patchwork of such descrip- 
tors one could conceive of ways to obtain globally-non-linear 
CVs. Recently, Kushima et. al. (37) have developed a self- 
learning, metadynamics-based algorithm based on these ideas 
that works by minimizing the energy and adding bias func- 
tions at the position of the minimum found so that subsequent 
minimizations will identify new minima. 

Ideas described above for the exploration of potential en- 
ergy surfaces cannot be straightforwardly transferred to the 
study of free energy surface because of the difficulties associ- 
ated with the calculation of derivatives at finite temperature. 
Nonetheless, Maragakis et al (29) have developed a method, 
GAMUS, that has some similarities to the K method devel- 
oped by Kushima et al. In GAMUS the kinetic traps that are 
preventing free diffusion are located by fitting the probability 
distribution of visited configurations with a Gaussian Mixture 
(GM) model. The resulting set of bespoke Gaussians are then 
used to update an adaptive bias that encourages the system to 
visit unexplored regions of configuration space. This adaptive 
approach accelerates the filling of basins and thus provides a 
considerable speed up over conventional metadynamics when 
one is using 3 or 4 collective variables. Nevertheless, the filling 
time will still increase exponentially with the number of CVs. 





Fig. 2. Schematic representations of why it is necessary to use more than one 
pea analyzer and why it is necessary to expand basins. The trajectory in panel (a) 
can be fit using a single pea analyzer. By contrast the trajectory in panel (b) must 
be fit with a pair of analyzers as the energetic barrier is low enough that there will 
be hopping events between the two sub-basins. Panel (c) shows how adding a single 
Gaussian to the center of the basin in panel (a) can lead to the creation of spurious 
basins in later cluster analyses. Panel (d) demonstrates how this problem becomes 
more severe as the dimensionality is increased and also that it will occur if the basins 
are given a fixed size. 



Reconnaissance metadynamics algorithm 

Reconnaissance metadynamics combines a number of new 
ideas with those of established methods and is thus effective 
with very large numbers of CVs. The bias potential is con- 
structed in terms of a patchwork of basins each of which cor- 
responds to a low free energy region in the underlying FES. 
These features/basins are recognized dynamically by periodi- 
cally analyzing the trajectory with a sophisticated clustering 
strategy. The region of configuration space in the vicinity 
of each basin is then described using a one-dimensional CV 
that is tuned using information collected during the cluster- 
ing. Consequentially, even when the overall number of col- 
lective coordinates (d) is large, depressions are compensated 
for rapidly because the bias is added in a locally-valid, low- 
dimensional space. 
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Fig. 1. Flow chart for the reconnaissance metadynamics algorithm. 



Cluster analyses are performed at regular intervals using a 
set of stored configurations for the CVs that are accumulated 
from the trajectory. During this analysis it is essential that 
some form of dimensionality reduction be performed as oth- 
erwise the fitting will be intractable. In addition, one must 
recognize that, because this is a dynamical trajectory, the 
system may well have hopped between different basins on the 
free energy surface (see figure ). Therefore, because we would 
ideally like to treat each basin separately, principle compo- 
nent analysis (PCA) is not an option. Furthermore, Gaussian 
mixture expectation maximization algorithms (29), although 
able to separate the basins, will become unstable when the 
number of collective coordinates is large. Thankfully however 
combinations of these two algorithms exist (38-40) that al- 
low us to cluster the data while simultaneously reducing the 
dimensionality. 

This clustering strategy provides us with a set of Gaus- 
sian centers (/i) and covariance matrices (C) for the various 
basins in the free energy surface. Some of these will have very 
low weights or will be very similar to previously encountered 
basins and can thus be safely discarded. Those remaining 
provide useful information on the local topology of the FES 
but cannot be used to predict the actual depth of the basin or 
its shape away from the center. Consequentially, a flexible bi- 
assing strategy must be used in the vicinity of the minimum as 
addition of a single Gaussian will not necessarily compensate 
for the depression in free energy. This failure to compensate 
basins fully can lead to the formation of spurious low energy 
features in the region surrounding the basin center (see figure 
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(c)), which is a problem that becomes more severe as the di- 
mensionality is increased. To resolve these issues we assume 
that the basin is spherically symmetric in the metric induced 
by C and, in the spirit of metadynamics, construct an adap- 
tive bias composed of small Gaussian hills of height u>i and 
width Ar, along a single, radial collective coordinate r(s) (see 
equation 1). 

_( r(s )_ ri )2 

Vi(s) = w t e 2A^ [1] 
r(s) 2 = ( S -nfC-\ S -n) [2] 

In the above s is a vector that denotes the position in the full, 
d-dimensional CV space. The form of equation 1 means that 
the bias associated with each hill acts in a hyper-ellipsoidal- 
crust-shaped region in the full, d-dimensional CV space. Each 
of these crusts has a shape much like a layer in an onion and 
so the integral of the bias added increases with r(s). Conse- 
quentially, the filling time for each basin no longer depends 
exponentially on the number of collective coordinates. Fur- 
thermore, the hills have a shape that is consistent with the 
underlying anisotropy of the free energy surface because of 
the use of the covariance matrix in Eq. 2. 

Obviously the distance from a basin center is only a good 
collective coordinate when we are close to that center and so 
each basin must have a size, S. This size assigns the region of 
configurational space in which it is reasonable to add hills that 
will force this system away from a particular basin. Setting an 
initial value for this size is straightforward as we know from 
our fitting that the basin's shape is well described by a mul- 
tivariate Gaussian. Therefore, we choose an initial size equal 
to So = yd — 1 + 3 because, as shown in the supplementary 
information, when the angular dependency of a d-dimensional 
Gaussian is integrated out the resulting distribution of r is ap- 




Fig. 3. Contour plots showing the potential energy + the current bias at selected 
points along a Reconnaissance Metadynamics trajectory for a particle diffusing about 
a 2D potential energy surface. The black dots indicate the positions of the snapshots 
accumulated from the trajectory while the red ellipses indicate the basins found using 
the PPCA algorithm. Blue ellipses are those basins, found during previous PPCA 
analyses, to which hills are being added. The expansion of these blue basins as the 
bias grows is clearly seen in this figure. 
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proximately equal to a ID-Gaussian with standard deviation 
V2 centered at \/d — 1. 

If the size of basins is fixed then the problems described in 
figure are encountered once more. Hence, in reconnaissance 
metadynamics the basin's size is allowed to expand during 
the course of the simulation so as to ensure that spurious 
minima, that would otherwise appear at the edge of basins, 
are dealt with automatically. Periodically we check whether 
or not the system is inside the hyper-crust at a basin's rim 
(S < r(s) < S + Ar) and also that it is not within the sphere 
of influence of any other basin. If these conditions are satisfied 
we then decide whether or not to expand using a probabilis- 
tic criterion, which ensures that it becomes more difficult to 
expand basins as the simulation progresses. Based on the 
loose analogy with free particle diffusion outlined in the sup- 
porting information, this probability for expansion is given by 
P = min(l, ) i where S is the current size of the basin, At 
is the time between our checks on whether or not to expand 
and D is a user defined parameter. 

The algorithm is summarized in the flow chart in figure 
1. Further details on the components of the algorithm can 
be found in the methods section and in the supplementary 
information. 



Results 

2D-surface. To illustrate the operation of the algorithm we 
first show how it can be used to accelerate the (Langevin) dy- 
namics on the model, 2D potential energy surface illustrated 
in figure . 

At low temperatures a particle rolling about on the sur- 
face shown in figure will remain trapped in one of the deep 
basins. This is precisely what is observed during the first part 
of the simulation, when no metadynamics is performed, as 
the first application of our clustering algorithm demonstrates. 
On addition of bias, the system quickly escapes this first basin 
and falls into other basins, the locations of which are identi- 
fied during subsequent applications of our unsupervised learn- 
ing protocol. This process continues throughout the simula- 
tion so, once basins are identified, the history dependent bias 
compensates for them quickly and hence the system rapidly 
explores the entirety of the energy surface. 

Figure shows that the reconnaissance metadynamics al- 
gorithm, when properly applied, finds only basins that cor- 
respond to the true features in the free energy surface. In 
addition figure (d) shows how effective the adaptive bias is in 
dealing with regions where small basins are encompassed in 
larger depressions. It clearly shows that initially small hills 
are used to deal with the sub-basins, while later much larger 
hills are used to compensate for the super-basin. 

Lennard-Jones 7. Small clusters of rare-gas atoms have a re- 
markably complex behavior despite their rather limited num- 
ber of degrees of freedom. A particularly well studied exam- 
ple is the two-dimensional, seven-atom, Lennard-Jones clus- 
ter (41,42) for which 4 minima and 19 saddle points in the po- 
tential energy surface have been identified (43) (see figure 4). 
At moderate temperatures (fcsT = O.le) this system spends 
the majority of its time oscillating around the minimum en- 
ergy structure, in which one of the atoms is surrounded sym- 
metrically by the six other atoms. Infrequently however the 
system will also undergo isomerizations in which the central 
atom of the hexagon is exchanged with one of the atoms on 
the surface (43). 

To test the reconnaissance metadynamics algorithm we 
examined this system using the coordination numbers of all 
the atoms (seven collective coordinates). In doing this we ne- 
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Fig. 4. The locations of the various basins found during a reconnaissance metadynamics simulation of the Lennard-Jones 7 cluster plotted as a function of the second and 
third moments of the distribution of coordination numbers and the index of the atom with the highest coordination number. Also shown is a free energy surface calculated 
using a well-tempered metadynamics simulation that employed the second and third moments of the distribution as collective coordinates. Each basin is represented by a circle 
with an area that is proportional to the bias at its center. On the free energy surface contours are placed at intervals of 0.1 e 



gleet the interchange symmetry to see if we can reproduce 
this symmetry in the positions of the various basins we find. 
Figure 4 gives the results of this calculations and denotes the 
position of each of the basins found by a circle whose area 
reflects the final bias at the basin center. The three lowest 
lying minima for this cluster have one atom which has a much 
higher coordination number than the others and so the basins 
identified along the trajectory have been grouped, based on 
the index of the atom with the highest coordination number, 
onto seven slices. On each of these planes the positions and 
sizes of the basins are consistent, which suggests that the algo- 
rithm finds the correct symmetry and explores configuration 
space correctly. 

In reconnaissance metadynamics there is no straightfor- 
ward connection between the bias and the free energy because 
each CV is only valid in a local region and hence the overall 
bias is not a function of a global order parameter. Nonethe- 
less, the bias greatly enhances the exploration of phase space 
and so free energies could be obtained using an umbrella sam- 
pling approach. However, even without this additional step, 
a reconnaissance metadynamics trajectory gives one a feel for 
the lie of the land, which can be used to obtain chemical 
insight. For example, the basin centers provide a set of land- 
mark points that can be used to validate a low dimensionality 
description of the system (27). For this simple case we did 
this by calculating the value at the basin centers of many dif- 
ferent candidate coordinates. We discovered that an optimal, 
in-plane separation of the various basins is attained when we 
use the second and third moments ( = -h X^ili( c * — ( c )) 2 
and /x| = i Si=i ( c * — ( c ) ) 3 respectively) of the distribution of 
coordination numbers. These two CVs clearly project out per- 
mutation symmetry and so we also performed a conventional 
well-tempered metadynamics simulation. Figure 4 shows the 
free energy energy surface obtained in its eighth slice. A com- 
parison of this surface with the results from the reconnaissance 
metadynamics shows how the basins found cluster around the 
minima in this FES. 



Poly-alanine 12. The protein folding problem is commonly 
tackled using computer simulation and there exist model sys- 
tems for which the entirety of the potential energy landscape 
has been mapped out (36) that represent a superb test of any 
new methodology. For example polyalanine-12, modeled with 
a distance dependent dielectric (ey = ry in Angstroms) that 
mimics some of the solvent effects, has been extensively stud- 
ied (44). This protein has a funnel-shaped, energy landscape 
with a alpha-helical, global minima. We found that during 
a 1 /is, conventional MD simulation started from a random 
configuration, the protein did not fold (see supplementary in- 
formation). Hence, examining whether or not the protein will 
fold during a reconnaissance metadynamics simulation will 
provide a third test of our methodology. 

For this reconnaissance metadynamics calculation we used 
the 24 backbone dihedral angles as the collective coordinates 
as these angles provide an excellent description of the pro- 
tein structure. These variables are periodic, which had to 
be accounted for in the method by replacing the multivari- 
ate Gaussians with multivariate von Mises distributions (45). 
This distribution, if sufficiently concentrated about the mean, 
is equivalent to a Gaussian in which the difference between any 
point and the mean is shifted to the minimum image. Con- 
sequentially, we can continue to use the same algorithm for 
trajectory analysis as long as we take into account the period- 
icity when we calculate differences and averages. In addition, 
we can define a quantity (see equation 3) that is equivalent 
to the distance from the center of the basin (equation 2) but 
that takes into account the periodicity of the CVs (Pi)- 
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Figure 5 provides a representation of a portion of a typ- 
ical reconnaissance metadynamics trajectory of the protein. 
The figure shows the values of all the backbone torsional an- 
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Fig. 5. A representation of a 50 ns portion of the reconnaissance metadynamics trajectory for alanine 12 modeled with a distance dependent dielectric. The bars in the lower 
panel give the values of 40 ps running averages for each of the torsional angles in the protein (see key). The red (light) line shows a similar running average for the bias potential. 
The system was annealed every 20 ps to examine the transitions between inherent structures and the energies obtained are indicated by the blue (dark) line. A number of 
low-energy, representative structures found during the trajectory are also shown and the box highlights the portion of the trajectory when the protein has a configuration that 
is near the global-minimum, alpha-helix structure. 



gles and clearly demonstrates that a large volume of config- 
urational space is explored during this relatively-short sim- 
ulation. Furthermore, unlike in conventional MD, after ap- 
proximately 22 ns the protein folds into the global-minimum, 
alpha-helix configuration. To further demonstrate that the re- 
connaissance metadynamics is ensuring that large portions of 
configurational space, that would otherwise not be visited, are 
being explored we calculated the inherent structures by mini- 
mizing the energy every 20 ps. The energies of the minimized 
structures are shown in the uppermost panel of figure 5 and 
demonstrate that the potential energy surface is very rough 
and that the alpha helix is considerably lower in energy than 
the other energy minima encountered. 



Conclusions 

We have proposed a new accelerated dynamics scheme, recon- 
naissance metadynamics, that uses a self-learning algorithm to 
construct a bias that accelerates the exploration of configura- 
tion space. This algorithm works by automatically identifying 
low free energy features and deploying a bias that efficiently 
and rapidly compensates for them. The great advantage of 
this algorithm is that there is no limit on the number of CVs 
on which the bias acts. This is only possible because we col- 
lapse all these CVs into a single collective coordinate that is 
only valid locally and patch together a number of these lo- 
cal descriptors in order to reflect the fact that the important 
degrees of freedom are not uniform throughout configuration 
space. As shown in figure this approach provides us with 
a simple, compact, hierarchical description of the free energy 



surface, which could be used to construct bias potentials for 
umbrella sampling. 

As shown for the two chemical systems on which we have 
demonstrated the algorithm, our ability to work with large 
numbers of collective coordinates means that one can employ 
generic configurational data, such as torsional angles and coor- 
dination numbers, as collective coordinates and thereby avoid 
all the usual difficulties associated with choosing a small set 
of collective coordinates. In fact, even when CVs, on which 
there are no large barriers to motion, such as the torsional an- 
gles on the terminal amino acid groups in alal2, are included 
the algorithm will still function. For both systems our method 
produces trajectories that contain an extensive exploration of 
the low energy parts of configurational space and hence pro- 
vides a feel for the lie of the land. Furthermore, even without a 
quantitative estimate of the free energy, considerable insight 
can be obtained from the trajectory. In the Lennard Jones 
this allowed us to attain an effective two dimensional descrip- 
tion of the landscape. For more complex systems non-linear 
embedding could be used to automate this procedure. 

Materials and Methods 



Mixture of probabilistic principle component analyzers. Throughout this work 
we use the annealing strategy outlined in reference (40) and in the supplementary 
information to do clustering. This algorithm requires one to state at the outset the 
number of clusters that are being used to fit the data and the number of annealing 
steps. For the latter we initially set cr 2 , which is the quantity treated like the temper- 
ature during the annealing, equal to the maximum eigenvalue of the covariance of the 
data and lowered cr 2 until it was less than 1 % of its initial value. To establish the 
correct number of clusters we run multiple fits to the data using different numbers 
of clusters and select the fit that gives the largest value for the Bayesian Informa- 
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tion Criterion (46), BIC=2 log[£(x,0)]-n p log[Af], where C(x,6) is the maximized 
likelihood for the model, n p is the number of parameters in the model and M is the 
number of trajectory snapshots used in the fitting. 

Selecting novel basins. As already discussed the GM algorithm provides us with 
the locations of a number of basins. Some of these will have very low weights in the 
fit and can therefore be safely ignored in the construction of the bias. Others however 
will provide information about the basins found during prior runs - in short information 
that is redundant. Therefore, we introduce a criterion for the selection of basins that 
requires that j\ [l-max(^j )]>TOL, where TOL is some user defined tolerance, f\ 
is the weight of the new basin in the fit and £^ is the similarity between the new 
basin i and the old basin j. To calculate £ f j we use Matusita's measure which can be 
calculated exactly for multivariate Gaussians (47). For an expanded basin of size S 
its original covariance is multiplied by a factor of S/S when calculating this function 
so that their expanded volume is appropriately taken into account. 

Lennard-Jones. The parameters for the simulations of Lennard-Jones 7, in 
Lennard Jones units are as follows. The temperature was set equal to k B T=0.1e 
using a Langevin thermostat, with a relaxation time of 0.1 y e/ma 2 . The equa- 
tions of motion were integrated using the velocity verlet algorithm with a timestep 
of 0.01 yj e/ma 2 for 5 x 10 7 steps. During reconnaissance metadynamics the 
CVs were stored every 100 steps, while cluster analysis was done every 1 x 10° 
steps. Only basins with a weight greater than 0.3 were considered and to these 
attempts to add hills of height 0.5 k B T and a width of 1.5 were made every 
1000 steps. Basin expansion was attempted with the same frequency with the 
parameter D set equal to 0.03. The coordination numbers were computed using 



c i=J2i^j 1- (t4) 8 ^"("ri) 16 ] ■ wnere r ij ' s tne distance between atoms i 
and j. 

poly-alanine-12. All simulations of polyalanine were run with a modified version of 
gromacs-4.0.3 (48), the amber96 forcefield (49) and a distance dependent dielectric. 
A timestep of 2 fs was used, all bonds were kept rigid using the LINCS algorithm and 
the van der Waals and electrostatic interactions were calculated without any cutoff. 
The global thermostat of Bussi et al (50) was used to maintain the system at a tem- 
perature of 300 K. The initial random configuration of the protein was generated by 
setting up the protein in a linear geometry, minimizing it and then running 1 ns of 
normal MD at 300 K in order to equilibrate. During reconnaissance metadynamics 
the CVs were stored every 250 steps, while cluster analysis was done every 5 x 10' 
steps. Only basins with a weight greater than 0.2 were considered and attempts were 
made every 1000 steps to add to these basins hills of height 0.4 k B T and width 1.5. 
Basin expansion was attempted with the same frequency with the parameter D set 
equal to 0.3. Minimizations to obtain inherent structures was done by first annealing 
for 1.2 ns with a decay rate of 0.996 ps 1 and subsequently performing a steepest 
decent minimization. Guidelines as to how to select parameters for reconnaissance 
metadynamics are provided in the supplementary information. For both this system 
and the Lennard Jones cluster we obtained similar results with a variety of different 
parameters sets. 
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In the following the mathematical details for the trajectory analysis tools used in the reconnaissance meta- 
dynamics algorithm are provided along with some information on how to select parameters. In addition, we 
Q_i also provide a collection of the low energy structures we found for alal2 and an analysis of a conventional MD 
trajectory for this system. 
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1 Mathematical details 

As discussed in the paper the algorithm we use for clustering is taken from references [1, 2]. In the following 
some background information on this algorithm is provided, as some of the concepts behind it are perhaps 
Oh unfamiliar to those working in the molecular simulation community. 

a 

O l.l Expectation-Maximization 

c/3 An expectation maximization algorithm is a method for fitting data to a statistical model. To understand it 
. consider a collection of m points, s TO - these could come from a dynamical trajectory. One can easily calculate the 
probability that these points came from a particular statistical distribution, P(s m ) by computing the following: 

(— | M 

dh c = n (i) 



> 



This quantity is called the likelihood of the model and is often quoted in terms of its logarithm as typically it 
will be very small. If instead of having a fixed probability model one has a model that depends on some set of 
parameters, or latent variables 6, then the likelihood will clearly be a function of these parameters: 

<N M 

C(0)=l[P( Sm ,0) (2) 

m— 1 



X 



o 

Obviously, when fitting the parameters of the model one would like to maximize this likelihood. Commonly this 
t-H is done using an expectation maximization algorithm. This iterative algorithm works by performing alternate 
expectation and maximization cycles. In the expectation (E) step the values for the probabilities of all the 
points are calculated from the current estimate of the probabilistic model. These probabilities can then be used 
to compute the likelihood of the model. In addition the calculated probabilities are used in the maximization 
step to recalculate the parameters of the model. Re-performing the E-step with the new estimate of the model 
parameters gives an improved likelihood and hence this process of alternate E and M steps eventually brings 
one to a maximum-likelihood solution. 
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1.2 Gaussian mixture models 



The Gaussian mixture (GM) model [3] is one of the simplest examples of classification by unsupervised learning 
and is the algorithm used in the G AMUS methodology [4] . This algorithm aims to fit a probability distribution 
composed of TV multivariate Gaussian distributions in d dimensions using an input set of M points. This 
is achieved by fitting the centers fi n , covariances S„ and weights /„ of the Gaussian distributions using an 
expectation maximization algorithm. During the E-step the probability for each point in each of the Gaussian 
distributions is calculated using: 



Pn(&rn) 



-pexp 



(27r)f|£ n |i 

where is the determinant of £„. The likelihood of the model is then computed using: 



N 



n=l 
M 

C = Y[P(s m ) 



(3) 



(4) 



(5) 



Having computed this probability data one can then perform the M-step in which new parameters for the 
probability model are calculated using: 



A I 



fJ-n = 
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(7) 
(8) 



m— 1 



Iterating these two steps leads one to a maximum in the likelihood function. 



There are two major disadvantages of this algorithm. The first is that the number of parameters that must 
be fitted increases quadratically with the number of dimensions. Consequentially it is only possible to use 
Gaussian mixture models with low dimensionality data. The second major problem is that it is not obvious 
how to initialize the parameters during the expectation step. This is important because there are many local 
maxima of the likelihood and therefore different start points will lead to different solutions. A common way of 
sidestepping this issue is simply to run the algorithm multiple times and to pick the best fit of the data. 



1.3 Deterministic annealing 

Often, one's concern is hard clustering; namely, classifying a set of data points into a small number of clusters 
based on the spatial distribution. Obviously if this is the aim then one is not so concerned with fitting the data 
to a probability distribution using the methods described in the previous section. Nevertheless hard-clustering 
algorithms work much like the EM algorithm described in the previous section [3]. Again though a problem 
with these algorithms is the presence of multiple local maxima in the likelihood function. Hence, starting the 
optimization with different initial conditions will lead to different solutions. 
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For these hard clustering problems Rose et al [5] have come up with an elegant solution to the initialization 
problem that is based on using an annealing strategy. This method works by doing multiple fits of the data to 
a probability distribution that is composed of N spherical Gaussian distributions. At each stage they fit the 
weights and centers of the Gaussian distributions. Meanwhile, the standard deviation, a, of the distribution is 
treated like a temperature and is thus exponentially decreased as the annealing progresses. For each value of a 
the fitting is done using an expectation maximization strategy in which, during the E-step, the probabilities of 
the data in each of the Gaussian distributions is computed using: 



and the likelihood is computed using: 



1 



(27rcr)i 



exp 



1 

2^2 



N 
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M 



(9) 

(10) 
(11) 



m— 1 



Then during the M step the weights and centers of the distribution are re-computed using equations 6 and 7 
respectively. 

The advantage of the annealing is that it is less likely to get trapped at local maxima of the likelihood surface. 
Furthermore, finding an initial start point is straightforward because, for high values of a, all the clusters will 
coalesce into a single cluster. Hence, the calculation can be initialized with a configuration in which all the 
cluster centers are placed at the center of mass of the data. In fact Rose et al [5] have demonstrated, through an 
analogy with statistical mechanics, that as a (which is the analogue of temperature) is lowered the system will 
undergo a sequence of phase transitions. During each of these phase transitions Gaussian distributions, which 
prior to the transition would merge on a single center, split apart in order to reflect the clustered structure 
inherent in the data. 



1.4 Probabilistic principle component analysis 

Principle component analysis (PC A), or essential dynamics, is a popular technique in the field of molecular 
simulation. It is commonly used to obtain directions in which there is maximal change in some particularly 
interesting set of collective coordinates be they the coordinates of particular atoms [6], the dihedral angles or 
the distances between particular atoms or groups of atoms. In PCA one computes the mean and covariance 
of the collective coordinates from a short MD trajectory. One then diagonalizes this covariance and projects 
the molecular motions onto the eigenvectors with the highest eigenvalues. This is useful as it gives one a sense 
for directions in which the structural changes are maximal. However, it is important to realize that these 
directions of maximal structural change are only valid in the small, locally-ergodic region which was explored 
during the unbiassed MD trajectory [7]. In other words, in unexplored regions of configurational space there 
is no guarantee that the principle components, calculated using essential dynamics, are the directions in which 
the structural changes are maximal. Consequentially, one would ideally like to fit the size of the region over 
which the calculated principle components are valid, which would obviously be related to the region of phase 
space explored during the course of the trajectory. 

In the machine learning literature PCA algorithms exist that are able to fit both the principle components and 
the region of space over which these components are valid [8] . These so called probabilistic principle component 
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analysis algorithms (PPCA) work by assuming that the data is distributed in (i-dimensional space according to 
a multivariate Gaussian distribution: 



P(s m ) = j r ex P 

(27r)i|C|5 



■ ^ (s m /i-n) C (s m Mn) 



(12) 



Here the covariance, C, is given by: 

C = ct 2 I + WW t (13) 

where W is a (d x g) parameter matrix that describes the distribution in the directions in which there are are 
large fluctuations. The fluctuations in the remaining directions are assumed less important. Hence, they are 
dealt with in an averaged way by assuming that in these directions the data can be modeled using a spherical 
Gaussian with a standard deviation of a. 

The existence of this probability distribution allows one to define a likelihood for the data which is given by: 

M 

log£= ^logP(s m ) = -— [dln(27r) +ln|C|+Tr(C- 1 S)] (14) 

m— 1 

where S is the covariance of the (i-dimensional data, which can be easily computed. Stationary points for this 
likelihood can be computed and it emerges that the global maximum for the likelihood occurs when W is given 
by: 

W = U 9 (A 9 - ( t 2 I)2 (15) 

where the q column vectors in the (d x q) matrix U g are the q principle eigenvectors of the covariance and 
A q is a diagonal matrix that contains the corresponding eigenvalues. The maximum likelihood estimate of a 2 
meanwhile can be calculated as the average of the remaining eigenvalues. Alternatively one can fix a 2 at the 
outset and can set q, the number of principle components in W, equal to the number of eigenvectors whose 
corresponding eigenvalues are greater than a 2 . 

1.5 Mixtures of probabilistic principle component analyzers 

As discussed in the previous section in the probabilistic version of principle component analysis (PPCA) [8] 
there is a probabilistic model for the observed data. What is more this probability distribution is a multivariate 
Gaussian. This suggests that one can combine the PPCA framework with the GM model and fit the data 
as a mixture of probabilistic principle component analyzers [9] thereby reducing the number of parameters 
required for clustering in high dimensionality. There are only two differences in the resulting algorithm from 
that described above. The first is that during the E-step, the C matrix obtained from the principle components 
must be used in computing the probabilities of the points rather than the true covariance, S, of the data: 



P n {s m ) — a i exp 
(2tt)2 |C„|2 



2 l^n) C n (s m fJ"n) 



(16) 



The second difference is that after updating the weights, means and covariances of the Gaussians, using equations 
6, 7 and 8 respectively, one must diagonalizc the covariance matrices so as to extract the principle eigenvalues 
and eigenvectors. These can then be used to recompute C n : 

W„ = U„, 9 (A n , 9 -a 2 I)h (17) 
C„ = a 2 I + W„W^ (18) 

where the d x q matrix U n ,q contains the q principle components of the covariance matrix £„ and A q is a 
diagonal matrix that contains the corresponding eigenvalues. 
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1.6 A simulated annealing algorithm for PPCA clustering 

The algorithm used in this work for doing clustering [1, 2] combines all of the elements discussed in the 
previous sections. Hence, we are using the mixture of PPCA analyzers discussed above but are using an 
annealing strategy to do the fitting rather than relying the EM algorithm alone. The resulting algorithm 
reads: 

Place all clusters at the center of mass of the data set. 
Set a 1 equal to maximum eigenvalue of covariance A max . 
while Negligible separation between two cluster centers do 

Perturb position of the center of all clusters. 

Perform EM minimization with spherical covariance. 

a <— aa 
end while 

while a 2 > 0.01A max do 

Perform EM using equations 1-6 



In the early phases one performs a deterministic annealing, which continues until all the clusters are well 
separated. Then, in the final stages of fitting, one uses the mixture of probabilistic principle component 
analyzers. For each E-M step, the number of principle components q is set equal to the number of eigenvalues 
of the covariance matrix that are larger than the current value for a 2 . 

2 Some useful properties of multivariate Gaussians 

As discussed we use some known properties of multivariate Gaussians in the algorithm. Firstly, we use the fact 
that integrating out the angular distribution in this function gives the following dependence of the distribution 
on the distance from the mean in the metric of the covariance, r: 



Here S(d, C) indicates the surface integral of a hypersphere of dimension d transformed by a orthogonal matrix 
C. The result in the final step, that the distribution as a function of r is approximately a Gaussian centered at 
y/d — 1, can be proved by comparing term by term the series expansions about y/d — 1 for the two functions. 
Matusita's measure for two multivariate Gaussian can be computed explicitly as: 



a <— aa 
end while 




(19) 




c i i r i 

;rjT— ex P -4 (Mi - Mj) T ( C i + Cj) _1 (^i - Mj) 



(20) 



where /x denotes the position of the Gaussians center and C its covariance. 
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3 Selecting parameters for reconnaissance met adynamics 



When running a reconnaissance metadynamics trajectory the user must select values for 7 different parameters. 
Three of these are the standard metadynamics parameters; namely, the height of the hills, the width of the hills 
and the frequency of addition. The remainder deal with the way the adaptive bias is generated by controlling the 
expansion of basins, establishing how often one should apply the clustering protocol and determining which of 
the pieces of information obtained from the clustering algorithm should be used in generating CVs. Guidelines 
on the setting of each of these parameters is given below. 

3.1 Frequency of storage 

During the trajectory one must periodically collect the values of the collective coordinates as these collected 
values will be used in the trajectory analysis. Obviously, it is not sensible to store the collective coordinates at 
every timestep as the points collected would be very highly correlated. Instead, one should allow some short 
period of time to pass prior to collecting statistics as then the extent of correlation will be smaller. In setting 
this time one can use the autocorrelation functions to examine what are the typical timescales over which 
correlations persist. 

3.2 Frequency of PPCA 

The frequency of PPCA will be determined by the number of points you wish to include in the PPCA analysis. 
Setting this parameter is a compromise between the accuracy of the fitting, which will increase with the number 
of points and hence the time taken to collect the statistics. In typical applications the time taken for the analysis 
is insignificant compared with the cost for the dynamics. 

3.3 Weight tolerance 

As discussed in the paper it is only those clusters that do not overlap with the already present basins which 
have weights greater than some user defined tolerance that are used to create the bias. Selecting this tolerance 
is easy, as in considering how to set it one can neglect the effect of overlap. This means that the tolerance just 
determines what fraction of the trajectory must be in a basin in order for the user to consider it a significant 
kinetic trap. 

3.4 Hill height and frequency of addition 

The accuracy of any metadynamics simulations and how it depends on the heights of the hills and the frequency 
of addition has been studied elsewhere [10, 11]. Although in reconnaissance metadynamics we are doing some- 
thing slightly different in terms of hill addition there is no reason to believe that the usual rules do not apply 
in this case. Consequentially, so as to not introduce too many discontinuities to the free energy as a function 
of time we frequently attempt to add hills that have a height lower than fc^T rather than infrequently adding 
very large hills. 

3.5 Hill width 

The PPCA algorithm provides a measure of the shape of the basin, which the onion-shaped hills are designed 
to reflect. As such in calculating the distance r(s) we employ a metric that is equal to the inverse of the 
covariance matrix. This ensures that for those directions in which there are large fluctuations the hills are 
wide, while in directions in which the fluctuations are small the hills are narrow. This locally adaptive hills 
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shape setting ensures that the bias compensates for the free energy rapidly. Furthermore, as shown below, it 
is relatively straightforward to calculate from the covariance matrix the average fluctuations in r(s) so that a 
sensible width, Ar^ for the Gaussians can be selected. With no loss of generality, in the following we assume 
(i = and (ss T ) = C. 

a[r 2 (s)] = yJ(r( S )*) (r(s) 2 f (21) 
(r(s) 2 ) = (s-C-s) 

= Tr(C- 1 S s T )=Tr[C- 1 (ss T )] 

= Tr [C^C] = d (22) 
(r(s) 4 ) = (s T c- 1 ss T c- 1 s ) 

= C al C ^5 [(sosrf(s 7 Sj> + (sss„){s,3S 7 ) 

+ (SnSjfs^Sj)] (23) 

= '-'afl^-yS [CapC^S+CsaCpy+CayCps] 

= d 2 +d + d = d 2 +2d (24) 

Here we have used the Einstein summation convention for sums in the later parts and have have used a well- 
known equality for the average of products of Gaussian samplcs[12]. Substituting equations 22 and 24 into 
equation 21 we obtain: 

cr[r 2 (s)] = Vd 2 + 2d - d 2 = V2d (25) 
The standard deviation for r(s) can then be obtained using a series expansion as shown below: 

r(s) = /« + «'»~/(>{) + «('V(>{) 

" rW1 - v^ ,r(s)21 = ^ = ^ (26) 

In metadynamics (and hence reconnaissance metadynamics) the hills should have a width that is a small multiple 
of this standard deviation. 

3.6 Expansion parameter 

As discussed in the text during reconnaissance metadynamics simulations we must assign a size to the various 
basins we find from cluster analysis and these sizes change as the simulation progresses. Obviously, as basins 
get larger it must become more difficult to grow further as we do not want basins to grow forever. We therefore 
use a probabilistic criterion for controlling basin expansion that we derive using some ideas from the theory of 
diffusion. 

The solution of the Fokker Plank equation for diffusion in d dimensions away from the origin is given by: 

P(s,t)= J^ e*p(~) (27) 



^(2irDt) d V 2£>i,/ 

From this expression we can derive an equation for the average distance between the particle and the origin, 
(r) and the rate of change of this distance: 

, > / x / — d(r) 1 l~D D 

(r)(t) = VDi -> -AJ. = -W— 28 
ww dt 2 V t 2 r v ' 
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From this final expression we can obtain an approximate equation for the time taken for the average distance 
between the particle and the origin to change by a factor of Ar: 



At' = 



2Ar(r) 



(29) 



D 



This expression provides us with a measure for the time it would take the average value of the distance from 
the origin to increase by a factor of Ar were the system diffusing about on a flat surface. This can therefore be 
used as a speed limit for basin expansion as basins shouldn't expand faster than the system can diffuse away 
from from the center of the distribution. In other words, we would like the final basin sizes to reflect the fact 
that the system was at one time trapped at the outside of a basin and not to reflect the fact that diffusion on 
a flat free energy surface takes time. Consequentially, taking the ratio between the time between hill additions, 
At, and this characteristic time for diffusion provides us with our probabilistic term for basin expansion: 



where S is the current size of the basin. 

Obviously, it is not possible to obtain an a-priory estimate for the value of the diffusion constant D in the above 
expression and thus this parameter must be set by trial and error. If it is set too small then one will observe 
that the basins never expand and thus many spurious basins will be observed. By contrast if it is set too large 
the basins will never stop expanding. 

4 Supplementary information: Lennard-Jones 7 

4.1 Well-tempered metadynamics simulations 

Well tempered metadynamics was used to obtain the free energy surface of the Lennard Jones cluster that is 
shown in figure 4 in the text. Four well-tempered metadynamics simulation were run and then averaged so as 
to increase the speed of convergence of the free energy surface. In all these simulations T + AT, the notional 
temperature at which the simulation is carried out, was set equal to 1.0 ksT. The widths of the Gaussians 
were 0.02 for both CVs and in two of the simulations the heights of the Gaussians were 0.001 e while in the 
other two the heights were 0.0025 e. So as to prevent exploration of uninteresting parts of phase space harmonic 
walls were used with force constants of 1000 eCV~ 2 . These were placed at values of 0.4 and 1.2 in the second 
moment and at a value of -0.3 in the third moment. 

5 Supplementary information: alal2 

5.1 Simulating alal2 with unbiassed MD 

Figure SI shows a representation of a 1 /is portion of an unbiassed trajectory for alanine 12 modeled with a 
distance dependent dielectric. The bars in the lower panel give the values of 40 ps running averages for each of 
the torsional angles in the protein (see key). A number of representative structures found during the trajectory 
are also shown. The system was annealed every 400 ps to examine the transitions between inherent structures 
and the energies obtained are indicated by the blue line. It is important to note that the timescale in this figure 
is 20 times as long as the timescale for the reconnaissance metadynamics simulation reported in the paper. 
Hence, the figure shows that during a much-longer, unbiassed molecular dynamics simulation the system does 
not explore a large volume of configuration space and is trapped in a small region of phase space. In addition 
this figure shows that the alpha helix is not formed on the timescale shown in the figure. 




(30) 
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Figure 1: A representation of a 1 ixs portion of an unbiassed trajectory for alanine 12 modeled with a distance 
dependent dielectric. The bars in the lower panel give the values of 40 ps running averages for each of the 
torsional angles in the protein (see key). A number of representative structures found during the trajectory are 
also shown. The system was annealed every 400 ps to examine the transitions between inherent structures and 
the energies obtained are indicated by the blue line. 
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5.2 Quenched structures for alal2 



In addition to this document we also provide a pdb file that contains the quenched structures from a reconnais- 
sance metadynamics trajectory and their energies. We provide this information as it can serve as a benchmark 
for the development of new methods for exploring potential energy landscapes. 
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